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Parametric Bilinear Generalized Approximate 
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Jason T. Parker and Philip Schniter 


Abstract —We propose a scheme to estimate the parameters 
bi and c, of the bilinear form z m = Tli.j bi z m^ Cj from noisy 
measurements {t/ m }*' = i, where y m and z m are related through 
an arbitrary likelihood function and are known. Our 

scheme is based on generalized approximate message passing 
(G-AMP): it treats bi and c, as random variables and z™’ 1 as 
an i.i.d. Gaussian 3-way tensor in order to derive a tractable 
simplification of the sum-product algorithm in the large-system 
limit. It generalizes previous instances of bilinear G-AMP, such 
as those that estimate matrices B and C from a noisy measure¬ 
ment of Z = BC, allowing the application of AMP methods 
to problems such as self-calibration, blind deconvolution, and 
matrix compressive sensing. Numerical experiments confirm the 
accuracy and computational efficiency of the proposed approach. 

Index Terms —Approximate message passing, belief propaga¬ 
tion, bilinear estimation, blind deconvolution, self calibration, 
joint channel-symbol estimation, matrix compressive sensing. 

I. Introduction 

A. Motivation 

Many problems in engineering, science, and finance can 
be formulated as the estimation of a structured matrix Z £ 
jgiA/xL f rom a no i S y ( or otherwise corrupted) observation 
Y £ R MxL . For various types of structure, the problem 
reduces to a well-known specialized problem. For example, 
when Z has a low-rank structure and only a subset of its 
entries are observed (possibly in noise), the estimation of Z 
is known as matrix completion (MC) |2]. When Z = L + S 
for low-rank L and sparse S, the estimation of L and S 
from a (noisy) observation of Z is known as robust principal 
components analysis (RPCA) a, a or stable principle com¬ 
ponents pursuit (SPCP) 0. When Z = BC with sparse C, 
the problem of estimating B and C from a (noisy) observation 
of Z is known as dictionary learning (DL) 0. When Z = BC 
and both B and C are positive, the problem of estimating 

B, C from a (noisy) observation of Z is known as nonnegative 
matrix factorization (NMF) |7|. 

In this paper, we propose an AMP-based approach to a more 
general class of structured-matrix estimation problems. Our 
work is motivated by problems like the following. 
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1) Estimate b and C from a noisy observation of] 

Z = Diag (Hb)AC (1) 

with known H and A. This problem manifests, e.g., in 

• Self-calibration 0. Here the columns of C are measured 
through a linear system, represented by the matrix A, 
whose outputs are subject to unknown (but structured) 
gains of the form Hb. The goal is to simultaneously 
recover the signal C and the calibration parameters b. 

• Blind circular deconvolution: Here the columns of C are 
circularly convolved with the channel 6, and the goal is 
to simultaneously recover C and b from a noisy version 
of the Fourier-domain convolution outputs]! 

2) Consider the more generaf] problem of estimating { h ,} and 

C from a noisy observation of 

Z = ^b i A {i) C (2) 

i 

with known { A : l> }. This problem manifests, e.g., in 

• Compressive sensing with matrix uncertainty 0. Here, 
Z = AC where A = biA W is an unknown (but 
structured) sensing matrix and the columns of C £ 
S. NxL are sparse signals. The goal is to simultaneously 
recover C and the matrix uncertainty parameters {bi}. 

• Joint channel-symbol estimation. Say a symbol stream 

{d} is transmitted through a length-A), convolutive 
channel {bi}, where the same length-iV fl > A), — 
1 guard interval is repeated every N p samples in 
{d}. Then the noiseless convolution outputs can 
be written as Z = where A^ = 

[°n p x(n 9 -hi) i *p °w pX (i-i) ] and where the first and last 
N g rows in C are guard symbols. The goal is to jointly 
estimate the channel {bi} and the (finite-alphabet) data 
symbols in C. 

3) Consider the yet more generaf] problem of estimating low- 

rank L and sparse S from noisy observations of 

z m = tr {&m(L + S')} for TO = 1,..., N z (3) 

1 For clarity, we typeset matrices in bold capital, vectors in bold lowercase, 
and scalars in non-bold. Furthermore, we typeset random variables in san-serif 
font (e.g., Z) and deterministic realizations in serif font (e.g., Z). 

2 Recall that circular convolution between b and c\ can be written as 
vi = Circ (b)ci, with circulant matrix Circ(6) = Diag(vTVAb) A 
for unitary discrete Fourier transform (DFT) matrix A. The DFT of the 
convolution outputs is then Avi = Diag(v7VA6) Ac[, matching J7}. 

3 Note (D is a special case of 0 with A= Diag (hi) A, where hi 
denotes the zth column of H. 

4 Appendix [Alshows (2) is a special case of 0 with rank-one L and S = 0. 
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with known { <f > m }. This problem is sometimes known as 
matrix compressive sensing (MCS), which has applications 
in, e.g., video surveillance Go), hyperspectral imaging ca, 
quantum state tomography OH, multi-task regression 02, 
and image processing 02- 

4) Another problem of interest is the estimation of matrices 
B and C from a noisy observation of 

Zi = FtBCGi for Z = 0,... , N z , (4) 

with known {Fp Gi} This problem arises, e.g., in spatial- 
spectral data fusion super-resolution, which aims to the 
hyperspectral images captured by N z cameras |fl4]| . In this 
case, the matrix BC models the high-resolution spatial- 
spectral scene of interest: B is a tall positive matrix 
containing material spectra and C is a wide positive 
(and often sparse) matrix containing material abundances. 
Then Gi and F / represent the spatial and spectral blur¬ 
ring/downsampling operators associated with the /th cam¬ 
era, which have fast implementations. 

B. Approach 

To solve structured-matrix estimation problems like those 
above, we start with a noiseless model of the form 

N b N c 

z = EE iiz(i,ilc i effiM ’ (5) 

2=0 j= o 

where b 0 = 1/y/Nb, cq = l/\fW c , and z F4) £ K M Vi, j are 

known. Note that the collection defines a tensor of 

size M x (iVb + 1) x (N,.+ 1). We then estimate the parameters 
b = [hi,, bN b } J and c = [c \,..., Cjv c ] T from y, a “noisy” 
observation of 2 . In doing so, we treat b and c as realizations 
of random vectors b and C with independent components, i.e., 

N b N c 

Pb,c(b,c) = YIpcA^)’ ( 6 ) 

4=1 2 = 1 

and we assume that the likelihood function of 2 takes the 

separable form 

M 

Py\z(y I Z ) = J^J_ Py m |z m (V m I *“)' (3) 

m= 1 

Note that our definition of “noisy” is quite broad due to the 
generality of Py m \z m - F° r example, ([7} facilitates both additive 
noise and nonlinear measurement models like those arising 
with, e.g., quantization urn Poisson noise G2, and phase 
retrieval ED. Note also that, since bo and c u are known, the 
model 0 includes bilinear, linear, and constant terms, i.e., 

N b N c N b N c 

z = h i zii ’ j)c o + co £ M (i ' 0) + b ° £ z( °' j) Cj 

2=1 j = l 2=1 j= 1 

+ c 0 6 0 z (0 ’ 0) . (8) 

In Section ITVl we demonstrate how 0-0 can be instantiated 
to solve various structured-matrix estimation problems. 

Our estimation algorithm is based on the AMP framework 
02- Previously, AMP was applied to the generalized linear 


problem: “estimate i.i.d. X from y, a noisy realization of 
Z = AXf leading to the G-AMP algorithm JT9), and the 
generalized bilinear problem: “estimate i.i.d. A and X from 
Y, a noisy realization of Z = AX, ' leading to the BiG- 
AMP algorithm ]20l - ]22 . In this paper, we apply AMP to 
estimate b and C from a noisy measurement of the parametric 
bilinear output Z = A(b)X (c), where A(-) and X(-) are 
matrix-valued affine linear functions. We write the relationship 
between b, c, and z = vec(Z) more concisely as (0 and coin 
the resulting algorithm “ Parametric BiG-AMP ” (P-BiG-AMP). 

We also show that, using an expectation-maximization (EM) 
ll23l approach similar to those used in other AMP-based works 
|22-[|26], we can generalize our approach to the case where 
the parameters governing the distributions pt,. , p Cj , and p y | Zm 
are unknown. 

C. Relation to Previous Work 

We now describe related literature, starting with versions of 
compressive sensing (CS) under sensing-matrix uncertainty. 

Consider first the problem of single measurement vector 
(SMV) CS with unstructured matrix uncertainty, i.e., recov¬ 
ering the sparse vector c from a noisy observation of z = 
(A + B)c, where A is known and the elements of B are small 
i.i.d. perturbations ED- AMP based approaches to minimum 
mean-squared error (MMSE) estimation were proposed in 
[|28l . | [29| . The extension to the multiple measurement vector 
(MMV) case, Z = (A + B)C, eliminates the need for B to 
be small and yields the DL problem discussed in Section II-AI 
For the latter, AMP-based algorithms were proposed in fZTl . 
( 221 . The proposed P-BiG-AMP generalizes this line of work. 

Next consider MMV multiple measurement vector (MMV) 
CS with output gain uncertainty, i.e., recovering C with sparse 
columns from a noisy observation of Z = Diag (b)AC, where 
A is known and b is unknown. For the case of positive b 
and no noise, ll30l proposed a convex approach based on 
minimization, which was generalized to arbitrary b in [311. For 
MMSE estimation in the noisy case, a G-AMP-based approach 
to the MMV version was proposed in [f32l . and G-AMP 
approaches to the single measurement vector (SMV) version 
with coded-symbol b and constant-modulus b were proposed 
in 132 and lfl7l . Our proposed P-BiG-AMP approach handles 
more general forms of matrix uncertainty than 02, 02, E2- 

MMV CS with input gain uncertainty, i.e., recovering 
possibly-sparse C from a noisy observation of Z = 
ADiag(6)C, where A is known and b is unknown, was con¬ 
sidered in |[34l . There, G-AMP estimation of C was alternated 
with EM estimation of b using the EM-AMP framework from 
11261 . As such, 1341 does not support a prior on b. 

A related problem is SMV CS with subspace-structured 
output gain uncertainty, i.e., recovering sparse c from a noisy 
observation of z = I)iag(T/ b)Ac with known A, H. This 
problem is perhaps better known as blind deconvolution of 
sequences b , c when H. A are DFT matrices and z is the 
DFT-domain noiseless measurement vector. Several convex 
approaches to blind deconvolution have been proposed using 
the “lifting” technique, which transforms the problem to that 
of recovering a rank-1 matrix L from a (noisy) observation 
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of z m = tr{$^l/} for m = 1 For example, f35l 

proposed a convex relaxation that applies to linear convolution 
with sparse c, |36l proposed a convex relaxation (with guaran¬ 
tees) that applies to circular convolution with non-sparse b. c, 
]8l proposed a convex relaxation (with guarantees) that applies 
to circular convolution with sparse c, and It37il proposed 
alternating and greedy schemes for sparse 6, c. Meanwhile, 
identifiability conditions were studied in GD-IID- 

For ©, i.e., CS with general matrix uncertainty, a pro¬ 
posed an alternating minimization scheme and m showed 
that the problem can be convexified via lifting and then used 
that insight to study identifiability issues. 

Finally, consider the matrix CS problem given by ©. For 
generiqj { T* m }, greedy schemes were proposed in liTOl and 
m and convex ones in un-Ha, mi- 

The P-BiG-AMP approach that we propose in this work 
supports all of the above matrix-uncertain CS, blind deconvo¬ 
lution, and low-rank-plus-sparse recovery models. Moreover, it 
allows arbitrary priors on bi and c 7 , allowing the exploitation 
of (approximate) sparsity, constant-modulus structure, finite- 
alphabet structure, etc. Furthermore, it allows a generic like¬ 
lihood function of the form (J7J, allowing non-linear measure¬ 
ment models like quantization, Poisson noise, phase-retrieval, 
etc. Although it is non-convex and comes with no performance 
guarantees, it attacks the MMSE problem directly, and the 
empirical results in Section [V] suggest that it offers better MSE 
recovery performance than recent convex relaxations while 
being computationally competitive (if not faster). 


D. Organization and Notation 

The remainder of this manuscript is organized as follows. 
In Section [IT] we present preliminary material on belief propa¬ 
gation and AMP, and in Section [III] we derive our P-BiG-AMP 
algorithm. In Section [TV] we show how the implementation of 
P-BiG-AMP can be simplified for several problems of interest, 
and in Section [V] we present the results of several numerical 
experiments. In Section IvTl we conclude. 

Notation : For random variable X, we use p x {x) for the pdf, 
E{x} for the mean, and var{x} for the variance. Af(x; x : v x ) 
denotes the Gaussian pdf with mean x and variance v x . For a 
matrix X, we use xi = [X] :i ; to denote the I th column, x n i = 
[X] n i to denote the entry in the n th row and I th column, X J 
the transpose, X* the conjugate, X H the conjugate transpose, 
|| X ||f the Frobenius norm, and ||X||* the nuclear norm. For 
vectors x, we use x n = [x\ n to denote the n th entry and 
IMIp = (E n \x n \ p ) 1,p to denote the ( p norm. Diag(a:) is 
the diagonal matrix with diagonal elements x, Conv(a;) is the 
convolution matrix with first column x, and Circ(ai) is the 
circular convolution matrix with first column x. 


5 For the special case where each 'I*has a single unit-valued entry (i.e., 
noisy elements of L + S are directly observed), many more schemes have 
been proposed (e.g., (3]l, 10), 1431 ). including AMP-based schemes 120 1 f22 . 



Fig. 1. The factor graph for parametric generalized bilinear inference under 
Ni = 2, N c = 3, and M = 4. 

II. Preliminaries 

A. Bayesian Inference 

For the model defined by ©-(O, the posterior pdf is 

Pb,c\y(b,c\y) = p y \ b . c (y \ b, c) p b (b) p c {c)/p y (y) (9) 

ocpy|z(y \z(b,c))p b (b)p c (c) (10) 

= (n^Vml Z rn(y™ \ Z m (6, c)) ) ( P bi 0*)) ( P Cj (Cj)) , 

m i j 

(ii) 

where I© used Bayes’ rule and ex denotes equality up to a 
scale factor. This pdf can be represented using the bipartite 
factor graph shown in Fig. Q] There, the factors in £EB are 
represented by “factor nodes” appearing as black boxes and the 
random variables in (ITTb are represented by “variable nodes” 
appearing as white circles. Note that the observed data {y m } 
are treated as parameters of the Py m |z m (2/m| - ) factor nodes, 
and not as random variables. Although Fig. [T] shows an edge 
between every bi an d Py m |z m node pair, the edge will vanish 
when z m (b , C) does not depend on b,, and similar for C } . 

B. Loopy Belief Propagation 

Our goal is to compute minimum mean-squared error 
(MMSE) estimates of b and c, i.e., the means of the marginal 
posteriors Pt>i|y(' I v) an d Pcj|y(' I v)- Since exact computation 
is intractable in our problem (see below), we consider approx¬ 
imate computation using loopy belief propagation (LBP). 

In LBP, beliefs about the random variables (in the form of 
pdfs or log pdfs) are propagated among the nodes of the factor 
graph until they converge. The standard way to compute these 
beliefs, known as the sum-product algorithm (SPA) HD, E3, 
says that the belief emitted by a variable node along a given 
edge of the graph is computed as the product of the incoming 
beliefs from all other edges, whereas the belief emitted by a 
factor node along a given edge is computed as the integral 
of the product of the factor associated with that node and 
the incoming beliefs on all other edges. The product of all 
beliefs impinging on a given variable node yields the posterior 
pdf for that variable. In cases where the factor graph has no 
loops, exact marginal posteriors result from two (i.e., forward 
and backward) passes of the SPA g6), (47). For loopy factor 
graphs like ours, exact inference is in general NP hard ll48l 
and so LBP does not guarantee correct posteriors. However, 
it often gives good approximations (49) . 
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C. Sum-Product Algorithm 

We formulate the SPA using the messages and log-posteriors 
specified in Table Q] All take the form of log-pdfs with 
arbitrary constant offsets, which can be converted to pdfs 
via exponentiation and scaling. For example, the message 
f,.)) corresponds to the pdf ^ exp(A^ l _ >i (f,.)) with 

C = J bi e M A Yi(^ b i))- 

Applying the SPA to the factor graph in Fig. [Q we arrive 
at the following update rules for the four messages in Tabled] 

A bi) = log / Py |z m (y m I Z m (b, c)) 

N c 

x n exp (A^ l< _ r (f, b r )) n exp Cfc)) 

r^Li k —1 


+ const 


A c - 

m—tj 


= log / 


{M^l.{<*}**? 


( 12 ) 

Py m |z m ( Um | Z m (b, C)) 


N b 


X 


n«P ( A m«-r(W) exp (A c m ^ k (t, Cfc)) 

fc+7 


r—1 

+ const (13) 

= logp bi (&i) + Y A ^(f,6i) + const 


r^m 


(14) 

A^-(f+l, Cj) = logp Ci (Cj) + Y A r^j(t> C j) + COnSt i 

r^m 

(15) 

where const denotes a constant (w.r.t 6, in (IT2l) and d 
and w.r.t Cj in (IT3l) and (IT5l) ). In the sequel, we denote the 
mean and variance of the pdf A e xp(A^.(f,.) by b m ^(t) 
and ft), respectively, and we denote the mean and variance 
of A exp(A^^ -(f,.)) by c mJ (f) and We refer to the 

vectors of these statistics for a given m as b m (t), v b m (t) € 
and c m (f), iz^(f) £ For the log-posteriors, the SPA 

implies 

A i (t+1, hi) = logp bi (6») + 5^ A^ l _ >i (f, bi) + const (16) 

m 

A®(f+l,Cj) = logp Cj (cj) + 51 + const (17) 


and we denote the mean and variance of + exp(A^(f,.)) by 
bi{t) and v b (t), and the mean and variance of ^ exp(A°(i,.)) 
by Cj(t) and z/ ; (/). Finally, we denote the vectors of these 
statistics as b(t),u b (t) £ R Nb and c(t),v c (t ) G R ffc . 


Z). Approximate Message Passing 

When the priors and/or likelihood are generic, as in our 
case, exact representation of the SPA messages becomes diffi¬ 
cult, motivating SPA approximations. One such approximation 
technique, known as approximate message passing (AMP) 
ESI, becomes applicable when the statistical model involves 
multiplication of the unknown vectors with large random 
matrices. In this case, central-limit-theorem (CLT) and Taylor- 
series arguments can be used to arrive at a tractable SPA 


A5Ui(t,.) 

SPA message from node Py m |z m to node bj 

A m<-»(*> ') 

SPA message from node to node Py m | Zm 

A° -It,.) 

SPA message from node Py m | Zm to node Cj 

A^(L-) 

SPA message from node Cj to node Py m |z m 


SPA-approximated log posterior pdf of b^ 


SPA-approximated log posterior pdf of Cj 

bm,i(t) and v^ft) 

mean and variance of ^ exp(A^ l ^_ i (f,.)) 

and 

mean and variance of exp(A•(£,.)) 

bi(t ) and v\(t) 

mean and variance of ^ exp(A^(f,.)) 

Cj(t) and Vj(t) 

mean and variance of exp(A^(f,.)) 


TABLE I 

SPA MESSAGE DEFINITIONS AT ITERATION t S Z. 


approximation that can be rigorously analyzed 1501 . In the 
sequel, we propose an AMP-based approximation of the SPA 
in Section Hl-CI 


III. Parametric BiG-AMP 
We now derive the proposed AMP-based approximation of 
the SPA algorithm from Section III-CI which we refer to as 
parametric bilinear generalized AMP (P-BiG-AMP). 

A. Randomization and Large-System Limit 

(i l) 

For the derivation of P-BiG-AMP, we treat z/f as re¬ 
alizations of i.i.d. zero-mean unit-variance Gaussian random 
variables and we treat , b,. Cj as independent for all 
Furthermore, we consider a large-system limit (LSL) 
where M, Nb, N c —» oo such that Nb/M and N c /M converge 
to fixed positive constants. Without loss of generality (w.l.o.g.) 
we will assume that E{b(} and E{c|} scale as 0(1/M). 
Given these assumptions, it is straightforward to show from 
(0 that E{z^} scales as 0(1) (see Appendix [Bli 
To derive P-BiG-AMP, we will examine the SPA updates 
C3 - fl3 an£ l drop those terms that vanish in the LSL, i.e., as 
M oo. In doing so, we will assume that the previously 
assumed scalings on Z m , b, , Cj hold whether the random 
variables are distributed according to the priors, the SPA 
message pdfs E3-C3- or the SPA-approximated posterior 
pdfs (fl~6] >- (fT7l) . These assumptions lead straightforwardly to the 
scalings of z m (t), v^(t), b mji (t), v b m i {t), c m j(t), and v^it) 
specified in Table 03] Furthermore, we will assume that both 
b m ,i(t) — bi(t) and c m j(t) — Cj(t) are 0(1/M), which leads to 
the assumed scalings on the variance differences in Table 011 
Notice that, since bft) = 0(1/ y/M) and Cj(t) = 0(1/y/M), 
the difference quantities (b m ,i(t) — bi(t)) and ( c m ,j(t) — Cj(t )) 
scale as 1/y/M times the reference quantities bi(t) and Cj(t), 
as in previous AMP derivations (e.g., ED-Gol). Other entries 
in Table Oil will be explained in the sequel. 

B. SPA message from node p y j Zm to node bj 

We begin by approximating the message defined in dTH i. 
First, we invoke the LSL to apply the central limit theorem 
(CLT) to Z m = z rn (b. c), where b and C are distributed 
according to the pdfs in (f]~2l i. (Details on the application 
of the CLT are given in Appendix |C]) With the CLT, we 
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bm,i (t) 

0{ m1/2 ) 

"m.iW 

O(w) 


°(£) 

c m,j (i) 

C( M l/2 ) 


O(^) 

C-m,j (^) Cj (t) 

O(-jj’) 

Pm(t) 

0(1) 

^m(t) 

0(1) 


0( M 3/2) 

Zm (t) 

0(1) 

>4(‘) 

0(1) 

*£,,■(*) - 

0( M 3/2) 

$m (t) 

0(1) 

^(*) 

0(1) 

4,i (*) - "?(*) 

0 (s*) 

Z <&>(t) 


Am ( t ) 



°(l^) 

0(1) 

0(1) 

2^ } (t) -2&’ j) (t) 

0(jpi7i) 

) 

0(1) 

Am (t) 

0(1) 

Am(t) ~A*\t) 

0( M !/ 2 ) 

r m,j (t) 

0( M l/2 ) 

v mj( f ) 

0(&) 

- ?j{t) 

O(jg’) 

Qm,i (t) 

< 9( M i/2 ) 

<,((*) 

O(ir) 

Qm,i{t) ~ Qi(t) 

O(jg-) 


TABLE II 

P-BlG-AMP VARIABLE SCALINGS IN THE LARGE-SYSTEM LIMIT. 


integral to a one-dimensional integral: 

^m_>i(^> bi) ~ 1°§ / Py m |z m (Vm \ z m) 

■J Zm 

X N(z m \ E{z m | b i = b / } . Yil l'{ Z„, | bj = 6/}) (23) 

= -ffm( + 6- ^ y m,i( t )4 j)2 


AT C 


2 = 1 

const. 


(24) 


where we have introduced the shorthand notation 


can treat Z m conditioned on b, = bi as Gaussian and 
thus completely characterize it by a (conditional) mean and 
variance. In particular, the conditional mean is 


| bi — bi} 


H m (q,v q ) =log f py mlZm (y m \z)Af(z-,q,u q ). (25) 

J Z 

We now further approximate (l24l >. For this, we first intro- 


= E i H b k°j Z t j) + (bi - bi) ^ C jZ£ 

k,j 3 


k,j 


(*>j) 


= &(*) 

' - - -' 

and it can be shown (see Appendix o that the conditional 
variance is 

N c 

var{z m | bi = bi} = uf m (t) + b ■ ^ (21) 

2 = 1 
N c 

+2 bi (t)z£ j) -b mii (t)z^ a ), 

2 = 1 

for z ( *£\t) = Y,kbm,k(t)zm' 3) and 

N r _ 



duce /-invariant versions of pi.m 

T) and 

[ 08) 

FmW = ^l(f) 




iv„ 

+E 


2 = 1 

fc=i 

= ^(t) 



09) 


5 

(20) 

2 =1 



(26) 

(27) 


noting that 

Pm,i(t) = p m {t) -b mt i(t)z^(t) 


(28) 


Vlm(t)=<(t)-V b m,i(t) 


N c 


^(f+E^w ^ 2 
2 = 1 


N c 


+ E *4,2^) \ bm,i{t) 2z{ m )2 ~ 2 bmAt^mitfzfr 

2=1 


(29) 


As with Pi, m (t) and the quantities p rn (t) and v^ n (t) 

are 0(1). Next, we define 


E ^w 2 +E ^(t)^ 2 

k^i \ 2 = 1 / 

j 

(30) 

+ E <2^ (^ } W 2 + bruAtf^ 2 

n — 1 

= J2^A)z { A j) 

(31) 

J — - 1 - 

-26 m , i (f)^ ) W^ ) )- (22) 


(32) 


*>2 


We note that Pi,m(t) and vf m (t) are analogous to the similarly 
named terms in G-AMP |fl9l and BiG-AMP ||20| . Since they 
pertain to estimates of Z m , they scale as 0(1). 

The Gaussian approximation of Z rn \^ i=bi (with mean and 
variance above) can now be used to simplify the representation 
of the SPA message (ITU from an (Nj, + N c — 1)-dimensional 


which are versions of Aim (t), IzCm (f), z(Cm‘ > (t) evaluated at 
6(f) and c(f), the means of the SPA-approximated posteriors, 
rather than at b m (t) and c m (t), the means of the SPA 
messages. As such, the quantities in (I30l>-(l32l> are also 0(1). 
Note that Zm*\t) , zt’ J ' can also be interpreted as 
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as partial derivatives: 


^*\t) = 

2ft %) = 


z (ij) = 


wM b ’ c) 


^lb,C) 


b = 6(f), c = c(t) 


d 2 


dbidcj 


b = b(t),c = c(t ) 
z m (b,c) 

b = b(t), c = c(t) 


(33) 

(34) 

(35) 


Comparing (l30l to ( 11 9[ i and invoking the independence of 
{Cj}, it follows that (z^m (!) — z$h’*\t)) is O^l/M 1 ^ 2 ). Simi¬ 
larly it can be shown that (z^m (!) —2ft’^(£)) is 0(1/M 1 / 2 ). 
With these new quantities, it can be shown (see Appendix [E} 
that (l24l can be expressed as 


A YSA) = const 

+ H m + (pi -&»(!)) 2^ (!) + 0(1/M), (36) 

N c 

<(t) + (b i -b i (t)) 2 '£^A t ) z ^ j)2 

3=1 

N c \ 

+ 2 (b,-Ht)) E + 0(1/M) . 

3=1 / 


The next step is to perform a Taylor series expansion of 
(l36l > in bi about 6 t (f). By carefully analyzing the scaling of 
all terms in the expansion, and neglecting those that vanish as 
M —> oo, it can be shown (see Appendix 10 that 


A YittA) 


const 


(37) 


s m (t)^\t)+u s m (t%(t)^*\t) 2 


+ (s 2 m(t) - "mi*)) Y U j( t ) Z m j) (z™ J) ( f ) -bi(t)Zm j) ) 


- (s 2 m (t) - w m (t))Y^m^ )2 


bl 


using the definitions 


S m (t) = H' m (^ m (t),Ki(t)) (38) 

4 ~ H m (Pm(t), <(*)), (39) 


where H' m (-, •) and //" (■, •) respectively denote the first and 
second derivative w.r.t. the first argument of H m (-,-). Note 
that, since (l37l > is quadratic, the (exponentiated) message from 
p y | Zm to bj is Gaussian in the LSL. Finally, since the 
function H m (-,-) and its partials are 0(1), we conclude that 
s m (f) and v s rn (t) are 0(1) as well. 

Furthermore, the derivation in lf20l App. A] shows that (1381 - 
© can be rewritten as 


S m (t) = (2m(f) -?ro(f))/^(f) (40) 

C(*) = C 1 - ^m(*)/<W) /<(<), (41) 


using the conditional mean and variance 

z m (t) = E{z m | p m =p m (t); &&(!)} (42) 

- v ar{z m | P m =p m (f); <(()}, • (43) 

Note (I42l >- (l43l > are computed according to the pdf 

Pz m \<p m ( z m | Pm(t)', l'm(^)) 

'C^rn^rrkym \ Z m) N ( Z Tn', Pm(t), ^(f)), (44) 

with O = f z Py m \z m (ym | z)Af(z-p m (t),uP n (!)), which is P- 
BiG-AMP’s iteration-! approximation to the true marginal 
posterior Pz m \y(z m \y). We note that (l44l) can also be inter¬ 
preted as the (exact) posterior pdf for Z m given the likelihood 
Py m \z m (ym\-) from O and the prior Z m ~ A f (p m (t), ^(t)) 
that is implicitly adopted by iteration-! P-BiG-AMP 


C. SPA message from node p y j Zm to node Cj 

Since Z m = )>Eo XEo tbE'E implies a symmetry 
between b, and Cj, the procedure to approximate A^ .(!, •) 
is essentially the same as that to approximate A^^(f,') from 
Section IIII-BI The end result is 


Am_>j (f ! c j) 


const 


(45) 


Sm(f)2^m(f) + ^m(t)Co(t)zt’ 3) (t) 2 


-(s 2 m (t) - V s m (t )) E ^ (*)4E (z^’*\t) - Cj(t)z& 


v s m (t)^\t) 2 - (s 2 m (t) - c(f)) E ^ w ^' )2 


4 


Z). SPA message from node Cj to p y j Zm 

We now turn our attention to approximating the messages 
flowing out of the variable nodes. To start, we plug the 
approximation of A m-*j (t, Cj) from (145b into (fl5l > and find 


Ara<-j(f+l| c j) 

~ const + log (p Cj (cj)Af(cj;? m j(t), v r m ^(t))) (46) 


where 

E (K(t)4*’ j \t) 2 (47) 

r^m 

N b 

-( S 2 r (t ) - V S r (t)) Y ^(f)4 M)2 

i= 1 

r m j(t) = Cj (!) + i4,.,•(!) E ( («?(f) - K(t)) (48) 

rzfLm \ 

N h 

x +^(f)2lE(!) 

2=1 

Since i/^ ■(!) is the reciprocal of a sum of M terms of 
0(1), we conclude that it is 0(1/M). Given this and the 
scalings from Table [Tl] we see that r m j(t) is 0(1/M 1 / 2 ). 
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Since r m j(t) can be interpreted as an estimate of Cj, this 
scaling is anticipated. 

The mean and variance of the pdf associated with the 
Cj) message approximation from (l46l > are 


approximation ( |371 > into (IT4l >. we obtain 

b i ) ~ l°g (PCi( b iW(bi] 

+ const (59) 




(*+!) - ^ J c Pc i (c)Ar( c ]r m j(t),i'Z ntj (t)) (49) v q mi (t) = 


l/ m, 7 ( i + 1 ) 




E (ewsEW 


(60) 


r^m 


N P . 


I'm.jO) *4,7^)) l 50 ) 

with K = J c p Cj (c)Af (c; and where g' c . 

denotes the derivative of i/ c with respect to its first argument. 
The fact that ( l49b and (l50l > are related through a derivative 
was shown in mu. 

Next we develop mean and variance approximations that do 
not depend on the destination node m. For this, we introduce 
m-invariant versions of r m j(t) and v r rn ? (f): 


~(t(t)-K(t))J2"W Z r’ j)2 


7 = 1 


= &i(f) + ^i(f) E ( («r0) - y rW) (61) 

r^m \ 


N c 


= 


E (cw^w 2 

m 

N b 

“(SmW - ^mW) E ) 

*=1 / . 

rj (t) = Cj(t) + v]{t) E ( (s^(f) - C(f)) 
m \ 

IVfc N 


(51) 


The mean and variance of the pdf associated with the 
b i) approximation from d59t are then 

b m ,i{t+ 1) A -L J bpbXbWi^Qm^^m.M) (62) 




I<J b 




(52) 




(63) 


i=l 


where K = f b p b . (b)Af(b; q m>i (t), v^S)) and where g' b . 
denotes the derivative of g^. with respect to the first argument. 
As before, we define the m-invariant quantities 


Comparing (l47li-(l48l> to (l5ll-(l52l> reveals that {v r m3 {t) — 

scales as 0(1/M 2 ) and that r m j(t) = fj{t) — v t( b ) = 


Vj + 0(1/M 3 / 2 ), and thus 


implies 


E (cw^c *) 2 


(64) 




,7'(*+l) 


N c 


= 9cj(rj(t) - v](t)s m {t)z^’ 3 \t) + 0 ( 1 /M 3/2 ), 

u j{t) + 0(1/M 2 )) (53) 

= g Cj (rj(t) - rf(t)s m (t)z£’ j) {t), v r 3 (t)) + 0(1/M 3 / 2 ) 

(54) 

= 9 c j (r j {t),v r j {t)) (55) 

^ (r;(f), ^(f))? ro (i)2< m *^(f) + 0(1/M 3 / 2 ) 

= Cj(t+l) - s m (t)z£’ j \t)v?(t+l) + 0(1/M 3 / 2 ), (56) 

where ( l54t follows by taking Taylor series expansions of (|53 | i 
about the perturbations to the arguments; (l55l > follows by 
taking a Taylor series expansion of (l5-fl > in the first argument 
about the point r}(i); and ( l56l > follows from the definitions 

Cj(t+ 1) = g Cj (rj(t ), Vj(t)) (57) 

Vj(t+ 1) - ^(^g'cjirjit),^^)). (58) 

E. SPA message from node bi to p y | Zm 

Once again, due to symmetry, the derivation for A^ n ^ t (t+ 
1 ,bi) closely parallels that for -(t + 1, Cj). Plugging 


- v s m {t)) E vC j{ t )Zm j)2 


7=1 


Qi(t) = bi(t) + u?(t) E ((si(f) - *4(i)) 


(65) 


iV c 


7 = 1 

and perform several Taylor series expansions, finally dropping 
terms that vanish in the LSL, to obtain 

b m ,i(t+ 1) = bi(t+l) - s m {f)z^*\f)u\{t+ 1) 

+ 0(1/M 3 / 2 ), (66) 

bi(t+ 1) = 5 , b i (g'i(f),^ ? (f)) (67) 

^(i+1) - (68) 

75 Closing the loop 

To complete the derivation of P-BiG-AMP, we use (1561 ) 
and d66l) to eliminate the dependence on m in the b, and Cj 
estimates and on i and j in the Z m estimates. By plugging ( 1561 ) 
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and ( l66l ) into the expression ( 126} for p m (t) and dropping terms 
that vanish in the LSL, it can be shown (see Appendix[G} that 


N c 


( N b 

Pmit) « z£^{t)-a m (t-l) 

(69) 

Although not justified by the LSL, we also approximate 


+ Y^’ i) (t -] ■ 
2 = 1 


Nb N b 

(70) 

2 = 1 2=1 

N c N c 

Y^it-i^mit) «E^’ j) w 2 ^) (71 > 

2=1 2=1 


for the sake of algorithmic simplicity, yielding 

Pm(t) « -s m (t- 1) 

( N b N c 

E^wVw+E^‘ ) ( t )Mw ], 

*=1 2=1 


(72) 


Next, we eliminate the dependence on zCm (t) from rj (t). 
Plugging ( 1751 1 into (152} and dropping the terms that vanish in 
the LSL yields 

G (f) W Cj (t) + V] (t) Y (Sm A) ~ V S m 0)) (77) 

m 

N b 

2—1 m 

X ^^(t) - s m (t- 1) Y ^n’* } (t-j , 

Although not justified by the LSL, we also approximate 


N b 

Y Sm(t)Sm(t- 1) Y U i ( ( )4 J) 2» * ] {t~ !) 

m 2—1 

N b 

(78) 

m i=l 

for the sake of algorithmic simplicity, yielding 

?j(t) « Cj(i) + i/J(t) E (s m {t)z^ j) {t) 

m 

N b 



noting that similar approximations were made for BiG-AMP 
|[20l . where empirical tests showed little effect. Of course, a 
more complicated variant of P-BiG-AMP could be stated using 
(169} instead of d72} . 

Equations (f56t and (166} can also be used to simplify vf r ' n {t). 
For this, we first use the facts u!f nj (t) = v^(t) + 0(1/M 3 / 2 ) 
and 1/^(4) = v\(t) + 0(1/M 3 / 2 ) to write (l27l> as 

N c N b 

vm = E +e w 2 ( 73 ) 

2=1 i=l 

M JV C 

+ EE^(^w^ )2 + 0(1/Af 1/2 ). 

i =1 2=1 

Then we use d56t with ( | 1 9l i and ( 130b to write 

N e 

&{t) =2&* ) (t) 

2=1 

+ 0(1/M), (74) 


and similarly we use (166} to write 


N b 


^\t) = ^\t) 


i =1 


noting that a similar approximation was made for BiG- 
AMP |[20l . The expression (f79l > then simplifies. Using (l30t 
to expand Zm*\t), the last term in ( [79b can be written as 

N b 

m 2=1 

Nb 

= Y V i^> E J/ mW 2 m‘ 3)2 ( 80 ) 

2=1 m 

N b 

+ ^ (*) E (*) E E ^ 

2=1 m 

~ Vj{t)Cj(t) Y Y !/ mW J! m j)2 . (8!) 

2=1 m 

where (|8H holds in the LSL (see Appendix Q}. Thus, ( 179b 
reduces to 

Tj (f) « Cj (t ) + i/J (f) E (*) 

m 

N b 

- v] Y E ( 82 ) 

m 2=1 

Similarly, we substitute (f74b into ([65} and make analogous 
approximations to obtain 


+ 0{1/M). (75) 

Plugging <[74l) - (1731 into (173} and dropping the terms that vanish 
in the LSL yields (see Appendix |H} 

N b N c 

<(*)« p p m (t )+E E (76) 

t= 1 2 = 1 


giW RiiiW + 

m 

Nc 

- ^ {*%(*) Y E (83) 

m 2=1 

Next, we simplify expressions for the variances v r At) and 
(t). First, it can be shown (see Appendix 0} that (140} and 



DECEMBER 14, 2015 


9 


( OTT i can be used to rewrite the second half of v J (t ) from (f5Tb 
as 


N b 

(frntt) ~ CW) 51 ( 84 > 

m 2—1 

_ V [ (Zm -Pm(t )) 2 ) \ 


where the random variable Z m above is distributed according 
to the pdf in d44t . For the G-AMP algorithm, fl9l Sec. VI.D] 
clarifies that, under i.i.d priors and scalar variances, in the 
LSL, the true z rn and the G-AMP iterates p m (t) converge 
empirically to a pair of random variables (z, p) that satisfy 
Pz\p{z\p(t)) = N(z-,p{t),v p {t)). This suggests that (l84l > is 
negligible in the LSL, in which case (fTil implies 

■ ( 85 > 

A similar argument yields 





-i 


( 86 ) 


The final step in the derivation of P-BiG-AMP is to approx¬ 
imate the SPA posterior log-pdfs in (fl~6]> and 03. Plugging 
(EH and (l45l > into these expressions, we get 


Aj?(t+1,&») « const + log(pb i (bi)N{b i \q i (t),v?(t))) (87) 
A C j(t+l, c j) m const + log{p C:i (cj)N'(c j ;r j (t),Vj(t))) (88) 

using steps similar to those used for (l46l i. The corresponding 
pdfs are given as (D2) and (D3) in Table [III] and represent 
P-BiG-AMP’s iteration-f approximations to the true marginal 
posteriors \ y) and p Cj \y(cj \ y). The quantities &*(£+1) 

and v^{t+ 1) are then respectively defined as the mean and vari¬ 
ance of the pdf associated with (l87l >. and Cj(t- 1-1) and i/?(f+l) 
are the mean and variance of the pdf associated with (f88l> . As 
such, bi(t+ 1) represents P-BiG-AMP’s approximation to the 
MMSE estimate of b, and uf(t+l) represents its approximation 
of the corresponding MSE. Likewise, Cj(t + 1) represents P- 
BiG-AMP’s approximation to the MMSE estimate of Cj and 
Vj(t + 1) represents its approximation of the corresponding 
MSE. This completes the derivation of P-BiG-AMP. 


G. Algorithm Summary 

The P-BiG-AMP algorithm is summarized in Table [III] The 
version in Table [III] includes a maximum number of iterations 
Tmax, as we H as a stopping condition (R19) that terminates 
the iterations when the change in falls below a 

user-defined parameter r st op. Noting the complex conjugates 
in (R12) and (R14), the algorithm also allows the use of 
complex-valued quantities, in which case A f in (D1)-(D3) 
would denote a circular complex Gaussian pdf. However, for 
ease of interpretation. Table Hill does not include the important 
damping steps that will be detailed in Section IIII-II 

The complexity scaling of each line in Table [HI] is tabu¬ 
lated in Table [IV] assuming that all MN b N c entries in the 
tensor zin J> are nonzero. In practice, Zm’’' 1 is often sparse or 


definitions: 

Pzm|p m ( 2 Ip;"*) — 

Poj |rj-(<= I f; v r ) = 

PbJqA 6 I V " a ) — 
initialization: 

Vm : s m (0) = 
Vi,j : choose bi 


P Vm I*) 


Avm I z) M(z-,p,vP) 


f z / Py m \z m (ym I z')Jf(z'\p,uP) 
PCj(c) N(c-,r,v r ) 

f c t P Cj (c')Ar(c'-,r,vr) 
P b .{b)Ar(b-,q, v*) 
fb' P bi (b')^(b';q,u<l) 

0 


(Dl) 

(D2) 

(D3) 

(11) 

( 12 ) 


for t = 1,.. 

■ T, 

fiax 




Vm, i : zj 

<,*) 

(i) = 

-yrnAc (i,3)~. 

-2^j =0 z m c 3 

(t) 

(Rl) 

Vm, j : 

».5) 

(t) = 


j) 

(R2) 

Vm : 

*.*) 

(t) = 


‘Vi) or E^oOpW^’^W 

(R3) 

Vm : 


(t) = 


‘■* ) (i)| 2 + Ef = ° 1 r';(i)|2 < m *- 3 ‘ ) (t)| 2 

(R4) 

Vm : 

i/P 

(t) = 

=<w + e5 

1 ^(t)Ef= c 1 ^(t)l4 J) l 2 

(R5) 

Vm : 

Pm 

(t) = 

= 5<J-’ ) (i) - 

(i-lX(i) 

(R6) 

Vm : 


(t) = 

= var{z m I P m = 

Ptn(f);<(f)> 

(R7) 

Vm : 

2 r „ 

(t) = 

= E{z m |p m =p 


(R8) 

Vm : 


(t) = 

= (i 


(R9) 


(RIO) 

(Rll) 


Vm : s m (t) = C? m (i) - p m (t))/^(t) 

Vj : ^it) = ( i ^(t )W | ^ 

Vj : rj (t) =%(t) + ^ (i) E"=i 3m (ty 

- I/J(t)o,(t) Em=l <,(*) zZ\ ^(*)l^’ 3) l 2 (R12) 

Vi : ^(t) = (E" = i^(t)l^'* ) (t)| 2 )" 1 (R13) 

Vi : q t (t) = 6t(i) + uHt) E" = i s m (t) 


?«•*) 


(tr 


- Em=l "mW Ef = T ^(t)l^’ 3) | 2 fRW) 


Vi : Vj(t+ 1) =var{c 3 | Kj (R15) 

Vi :c 3 (t + l) = E{c 3 |r i = r j (*);G r W} (R16) 

Vi : j/jTi + l) =var{bi | q 4 = qi(f); (R17) 

Vi : 6i(t + l) = E{bi |q 4 =5i(t);i/f(t)} (R18) 

if Zm=i |2&>’* ) (i) — — 1)| 2 < T st o P Em=i |3^’* ) (t)| 2 . Stop (R19) 

end 


TABLE III 

The P-BiG-AMP Algorithm 


(Rl) 

0(MN b N c ) 

(R2) 

0(MN b N c ) 

(R3) 

0(M(N b AN c )) 

(R4) 

0(MN b + MN C ) 

(R5) 

0(MN c N b ) 

(R6) 

O(M) 

(R7) 

O(M) 

(R8) 

O(M) 

(R9) 

~0[M) 

(RIO) 

OfMj 

(Rll) 

0(MN c ) 

(R12) 

0(MN b N c ) 

(R13) 

0(MAT b ) 

(R14) 

0(MN b N c ) 

(R15) 

0(N C ) 

(R16) 

0(7V C ) 

(Rl?) 

~0(Nb) 

(R18) 



TABLE IV 

Worst-case complexity of P-BiG-AMP from TableIThI 


implementable using a fast transformation, allowing drastic re¬ 
duction in complexity, as shown in Section HV1 Thus, Table II VI 
should be interpreted as “worst-case” complexity. 


H. Scalar-Variance Approximation 

The P-BiG-AMP algorithm from Table [HI] stores and pro¬ 
cesses variance terms 77^ , v p n , v^ L . tz*,, iA , uf , z/ : , that de¬ 
pend on the indices m,j,i. The use of scalar (i.e., index- 
invariant) variances significantly reduces its complexity. 

To derive scalar-variance P-BiG-AMP, we first assume Vi : 

^(t) « v b (t) ± a «d Vj : « u%t) 4 
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jj- "jit). Then we approximate 77^ (f) as 


N b N ( 


v m{t) ~ 


(89) 

M 

N b c( , N c 

E ii^wii 2 + ^E iE J) wii 2 = n*)- 



(90) 

Similarly, 

v^(t) is approximated as 



N b N c 


<(*) 

^f^ + ^kwEEi^i 2 

(91) 

«F P 

b c Nb N c 

W+ ( t () EE Ill’ll 2 = ^w. 

*= 1 7=1 

(92) 

where i 

XEi | 2 can be pre-computed. 

Even 


with the above scalar-variance approximations, v* n (t) is not 
guaranteed to be m-invariant. Still, it can be approximated as 
such using v s (t) = J2m =1 i n which case 


^W^^WII^WII 2 )" 1 (93) 

« =" r (t) (94) 

M 

rj(t ) = Cj(t) + v r {t) E Sm{t)z^’ j) it)* 

m= 1 

N b 

-^{t)v s {t)v\t)c 3 (t)Y ll^ll 2 , (95) 

2=1 

where |||| 2 can be pre-computed. Similarly, 

(^(OII^WII 2 )" 1 (96) 

«^ s (o^En £(i ’* } wn 2 ) = ( 97 > 

M 

Qi(t) = bi(t) + v q (t) Y Sm(t)z < £* ) (t)* 

ra= 1 

N c 

-v*(t)v a (t)v°(t)bi{t)Y ll^ll 2 . (98) 

7=1 

The scalar-variance P-BiG-AMP algorithm is summarized 
in Table [V] The complexity scaling of each line in Table [V] 
is tabulated in Table [Vi] Like with Table II VI the values in 
Table PvTl should be interpreted as “worst-case.” 


definitions: 

Pzmlp m ( z Ip;*' 1 ') - 

Pcjlrj(clr;v r ) = 

Pbilq^b | ?! v q ) = 


Py m | Zm (l/m I z)M(z-,p,uP) 

L'Py m |z m («r" \*')N(z';p,pP) 

PCj(c) N(c,r,v r ) 

I c i Pc j (=')JV(c'i?,i/'-) 

Pb,-( b ) N(b-,q,v q ) 

V Pb i «>')JV(b' i f,^9) 


(Dl) 

(D2) 

(D3) 


initialization: 

Vm:s m (0)=0 (It) 

Vf, j : choose 6i(l),iz i, (l),cj(l),!z c (l) (12) 

for t — 1, . . . Tmax 

Vi : 2 < *'* ) (t) = Ef= 0 2 ( ’ j) Cj( 4) (Rl) 

Vi : 2<*-«(t) = ESo^W z(<,i) ( R2 > 

or Z:f = c 0 CiW2 ( *^(t) (R3) 
P p (t) = (^W ESi l|2' (i '* ) (0II 2 

+ ^ c P)Ef = c i ||2<*- j) (i)| 2 )/M (R4) 

^(t) = 77*>(t) + ^(tK/t) eS, Ef=T f/M (R5) 

p(f) — (t) — fs(t — 1)T/ P (t) (R6) 

^(t) = Em=i var{z m | p m =p m (t); ^(t)}/M (R7) 

Vm : z m (t) = E{z m | p m = p m (i); y p (t)} (R8) 

yS (t) = (1 — v z (t) / v p (t)) / v p (t) (R9) 

s(t) = (z(t) - p(t))/v p (t) (RIO) 

■' , (t) = (-’(t)E j *i 1 ||2(*^(t)|| 2 /JV 0 ) _1 (Rl 1) 

Vi : fi(t) = (i - eS, lk (i ' 3) ll 2 )Si(t) 

+ i/ r (i)z ( *'3 )H (i)«(t) (R12) 

■' s w«(>' , we2i ii^ i ’* ) (t)ii 2 /jv0~ 1 <R13) 

Vi : *(t) = (l - ^(t). a (t)/(t) Ef=T IN (i ’ 3) H 2 )bi(t) 

+ i>®(t)z (<,,l,)H (t)a(t) (R14) 

t/ °(t+ 1 ) = 12^=1 var{c 3 |r^=fj(i);i/J(t)}/iVe (R15) 

Vi : c,(i+l)=E{c, | G- = rv(i);^J(t)} (R16) 

!Z 6 (4 + 1) =eSi va r{bi |q j =5i(t);vf(i)}/iV I , (R17) 

Vi : bi(4+l)=E{bi |q i= gi(t);i/f(t)} (R18) 

if||2(*’*)(t)-2<*-*>(t-l)|| 2 < r st „p||2 ( *’* ) (4)H 2 ,stop (R19) 


end 


TABLE V 

The Scalar-Variance P-BiG-AMP Algorithm 


(Rl) 

0(MN b N c ) 

(R2) 

0(MN b N c ) 

(R3) 

0(M(N b AN c )) 

(R4) 

0(1) 

(R5) 

"o(T) 

(R6) 

O(M) 

(R7) 

7MM) 

(R8) 

IMM) 

(R9) 

7mm) 

(RIO) 

7MM) 

(Rl l) 

0(1) 

(R12) 

0(MN c ) 

(R13) 

"0(1) 

(R14) 

0(MN b ) 

(R15) 

0{N C ) 

(R16) 


(R17) 

OpVb) 

(R18) 

opvi] 


TABLE VI 

Worst-case complexity of scalar-variance P-BiG-AMP. 


7. Damping 

Damping has been applied to both G-AMP IfSTI and BiG- 
AMP ll20l to prevent divergence. Essentially, damping (or 
“relaxation” in the optimization literature) slows the evolution 
of the algorithm’s state variables. For G-AMP, damping yields 
provable local-convergence guarantees with arbitrary matri¬ 
ces GD while, for BiG-AMP, damping has been shown to 
be very effective through an extensive empirical study ED. 


Motivated by these successes, we adopt a similar damping 
scheme for P-BiG-AMP. In particular, we use the iteration-/, 
damping factor /3(f) £ [0,1] to slow the evolution of certain 
variables, namely, v^, s" m , bi, and cj. To do this, we 
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replace steps (R4), (R5), (R4), and (RIO) in Table HIH with 


/ N b N c 

v p m (t)=m( + 

\ 7 — 1 7 — 1 


b - A J - _L 

+ (! - P(t )!) 

(99) 

/ N b N c 


+EE^(*)^(*)i^' ) (*)i 2 

' ?" — i 1 — i ' 

) 

+ (1 1) 

(100) 

CW = ^)((! - 


+ ( 1 -#)K( t - 1 ) 

(101) 

S m (t) = P(t)(z m {t) -Pm(t))/^(t)) 


+ (i - P{t))s m (t- 1), 

(102) 


and we insert the following lines between (RIO) and (Rll): 


(P-BiG-AMP approximated) posterior and the assumed prior 
on b , and the third term rewards highly likely estimates Z. 

For adaptive damping, we adopt the approach used for both 
G-AMP and BiG-AMP in the public domain GAMPmatlab im¬ 
plementation (53l . In particular, if the current cost J(t) is not 
smaller than the largest cost in the most recent stepWindow 
iterations, then the “step” is declared unsuccessful, the damp¬ 
ing factor /3(f) is reduced by the factor stepDec, and the 
step is attempted again. These attempts continue until either 
the cost criterion decreases or the damping factor reaches 
stepMin, at which point the step is considered successful, 
or the iteration count exceeds T max or the damping factor 
reaches stepTol, at which point the algorithm terminates. 
Otherwise, the step is declared successful, and the damping 
factor is increased by the factor step Inc up to a maximum 
allowed value stepMax. 


bi(t) = p{t)bi(t) + (1 - (3(t))bi(t - 1) (103) 

Cj{t) = (3(t)cj(t) + (1 - P(t))cj(t - 1) (104) 

N c 

= (105) 

3=0 

Nb _ 

(106) 

i =0 


The quantities and ~Zm 3 ' 1 (f) are then used in steps 

(R11)-(R14), but not in (R4)-(R6), in place of the versions 
computed in steps (R1)-(R2). Similarly, the newly created state 
variables 6j(f) and Cj (f) are used only to compute (f) and 
’Zm. ’ 1 ' 1 (t). Note that, when /3(f) = 1, the damping has no effect, 
whereas when /3(f) = 0, all quantities become frozen in f. 
Although these modifications pertain to the full P-BiG-AMP 
algorithm from Table HIH similar damping steps can be applied 
to the scalar-variance version from Table [V] 

1) Adaptive Damping: Because damping slows the conver¬ 
gence of the algorithm, we would like to damp only as much 
as needed to prevent divergence, i.e., to adapt the damping. 
An adaptive damping scheme for G-AMP was described in 
ll52l and a similar one was described for BiG-AMP in l20l . 
Both are based on monitoring an appropriate cost J(f) and 
applying more damping when the cost increases or less when 
the cost is decreasing. The same approach can be used for 
P-BiG-AMP. For example, extending the approach used for 
BiG-AMP l20l would lead to the cost 


J{t) = Y^ D ( Pc i\ r A' Pc 3 (-)) ( 107 ) 

3 

+J2 D {p^\ q 4 (‘ I ^0)) 

i 

~ E z m ~Af(?it'* ) (t);yP (t)) { l°gPy m |z m (ym | Z m )}. 


Meanwhile, the Bethe-free-energy approach used in ll22l . l52l 
offers a more principled, yet more complex, alternative. Intu¬ 
itively, the first term in ( 1 1 07b penalizes the deviation between 
the (P-BiG-AMP approximated) posterior and the assumed 
prior on C, the second penalizes the deviation between the 


J. Tuning of the Prior and Likelihood 

To run P-BiG-AMP, one must specify the priors and like¬ 
lihood in lines (D1)-(D3) of Table lUT] and Table [VI Although 
a reasonable family of distributions may be dictated by the 
application, the specific parameters of the distributions must 
often be tuned in practice. Building on the approach de¬ 
veloped to address this challenge for G-AMP ll25l . which 
was extended successfully to BiG-AMP in (20), we outline 
a methodology that takes a given set of P-BiG-AMP pri¬ 
ors {p b .(-;0),p Cj (-;0),p ym | Zm (// m |-;6>)} Vm ,n,i and tunes the 
vector 6 using an expectation-maximization (EM) l23l based 
approach, with the goal of maximizing its likelihood, i.e., 
finding 9 = argma x 0 p y (y; 9). 

Taking b , C, and Z to be the hidden variables, the EM 
recursion can be written as l23l 


~fc+i 

9 = arg max E 

e 


= arg max 

e 


{log p b ,c,z,y{b,c,z,y-9) y;f^j 
^E{logp bi (b t ;0)|y;^} (108) 

i 

X] E { log Pc, (Cj-,9) y; 9 | 

3 

^E{logp y j Zm (t/ ro |z m ;0) y;0* j 


where for (1 1 08b we used the fact pb ,c,z,y(b, C,Z,y:9) = 
Pb{b; 9)p c (C; 9)p y \ z (y\z ; 9) 1 z - z (b,c) anddie separability of 
Pb , p c , and Py\ z - As can be seen from ( 1 1 08b . knowledge of 
the marginal posteriors {Pb i \yiPc j \yiPz m \y}'ii,j,m is sufficient 
to compute the EM update. Since the exact marginal poste¬ 
riors are too difficult to compute, we employ the iteration-/: 
approximations produced by P-BiG-AMP, i.e.. 


Pb i \y(pi\y) ( 109 ) 

Pcfyicj | y) « p Cj \ tj {cj | ?j (t); v] it)) (110) 

Pz-m\y(~ m I V) ^ Pz m |p m ( z m | Pm(t ); ! (HI) 


for suitably large t, where the distributions above are defined in 
(D1)-(D3) of Table Hill In addition, we adopt the “incremental” 
update strategy from ED, where the maximization over 9 is 
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performed one element at a time while holding the others fixed. 
The remaining details are analogous to the G-AMP case, for 
which we refer the interested reader to ||25l . 

IV. Example Parameterizations 

P-BiG-AMP was summarized and derived in Section [Til] 
for generic parameterizations z l ' %A in ©. A naive imple¬ 
mentation, which treats every EE as nonzero, would lead 
to the worst-case complexities stated in Table [IV] (or Ta¬ 
ble |VT] under the scalar-variance approximation). In prac¬ 
tice, however, {zm } is often sparse or implementable us¬ 
ing a fast transformation, in which case the implementa¬ 
tion can be dramatically simplified. We now describe sev¬ 
eral examples of structured detailing the computa¬ 

tions needed for the essential scalar-variance P-BiG-AMP 
quantities E*=i E& II^WII 2 . 

{E’* )H (W)}E and {E*^ H (W)}f=i- 

A. Multi-snapshot Structure 

With multi-snapshot structure, the noiseless outputs become 

N b 

Z = J2 b i Aiz) C with known {A (i) }, (112) 

2=0 

where Z £ C KxL and C £ C NxL foi@ L > 1. Thus we 
have A A £ C KxN , M = I\L, and N c = NL. Defining 
2 : = vec (Z) and c = vec(C), we find 
N b 

z = Y j } a{Il®A^)c, (113) 

2—0 


with pre-computed 

N b 

r = A (l)H A (i) . (121) 

2=1 

The following quantities can also be pre-computed: 


N b N b 

Di* <,J Ti 2 = Di°!ti >„«» 2 

2=1 2=1 

N c 

Y / \\z^\\ 2 = L\\A^\\ 2 F 

3 =1 

N b N a 

EE \\z^ j) \\ 2 = Ltr(r). 

2=1 j =1 


( 122 ) 

(123) 

(124) 


Furthermore, under the scalar variance approximation, 


R{t) = (1 - v r {t)v s {t)v b {t)D r )C{t) 
+ u r (t)A H (t)S(t ) 

q(t) = (l — v q (t)v s (t)v c {t)D q }b(t) 
vec (A (1) C(f)) H 


+ v q (t) 


s(t), 


vec (A l - JVb ^C(t)) H 


(125) 


(126) 


with the following pre-computed using o? = !irl : 

D r 4 diag { EE\ ll«i°l| 2 , ■ • ■ ■ E^i II^H (127) 
D q 4 Ldiag {||A«|lE ..., ||A^|If} (128) 


which implies that 


Z^ j) = [Ji® 

(114) 

z {l ’*\t) = vec (A^C(t)) 

(115) 

z ( -*' j \t)=[l L ®A{t)\.. 

(116) 

Nb ^ 

£(*’*)(£) = E^(^)ve c (A ( ^C(f)) 

= vec (A(£)C(f)) 


(117) 

N b 

^co = EE*) a E 

2=0 

(118) 

where [X] : j denotes the jth column of X and C(t) £ C NxL 
is a reshaping of c(t). Note that (11 1 4l)-(ITT6t follow directly 
from dl 13|) via the derivative interpretations (l33l>-(l35l>. 

From the above expressions, it can be readily shown that 


N b N b 

E II^WII 2 = EII- A ®c{t)\\ 2 F = tr (rd(t)d(t) H ) 

2=1 2=1 

(119) 

f^\\z^ j \t)\\ 2 = L\\A(t)\\ 2 F (120) 

3 =1 

6 When L = 1. OH) reduces to the general parameterization s. 


Note that ( 1117M128I ) specify the essential quantities needed 
for the implementation of scalar-variance P-BiG-AMP We 
discuss the complexity of these steps for two cases below. 

First, suppose w.l.o.g. that each A^ has N a < KN 
nonzero elements, with possibly different supports among 
{A^}. This implies that A(t) has at most min(A r f,7V a , I\ N) 
nonzero elements. It then follows that (1117b consumes 
mm(NbN a ,KN)L multiplies, (II 1 8b consumes N b N a , (II 1 9b 
consumes Lmin(Nb(N a + K),N 2 ) and (1 120b consumes 
min (NbN a ,KN) multiplies. Furthermore, ( 11251 ) consumes 
w min(NbN a , KN)L multiplies and ( 11261 ) consumes « 
N b L(N a + K). In total, 0(mm(N b N a , KN)L + N b N a L + 
N b KL + Lmin(Nb(N a + K),N 2 )) multiplies are consumed. 
For illustration, suppose that N b N a < KN and N b N a < N 2 . 
Then 0(NL + N b L(N a + K )) multiplies are consumed, in 
contrast to 0(MN b N c ) = 0(KNL 2 Nb) for the general case. 

Now suppose w.l.o.g. that, for a given b , the multiplication 
of A(b) by a vector x can be accomplished implicitly using 
N a multiplies. For example, N a = 0(N log TV) in the case of 
an FFT. Then (11171 ) consumes N a L multiplies, ( 11 191 ) consumes 
KL (using {A^C(t)} computed for q(£)), and (11201 ) can 
be approximated using 0(N a ) multiplies. Furthermore, (11251 ) 
consumes « (TV + N a )L multiplies and (11261 ) consumes « 
N b L(N a + K). In total, 0(L(N + N b N a + N b K)) multiplies 
are consumed, in contrast to 0(MNbN c ) = 0(KNL 2 Nb ) for 
the general case. 
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B. Low-Rank Structure 


and so 


With low-rank signal structure, the noiseless outputs become 


= tr ($" B J C) , to = 1,..., M, (129) 

with known {& m }, where B G C NxK , C G C NxL fot0(V > 
1. Thus we have G C KxL , Nb = NI\ , and 7V C = NL. 
Defining (p rn = vec(«fr m ), b = vec(J3), and c = vec(C), 

= 4" vec(B J C) = 6 T ($* m ® I N )c (130) 

= vec {B<f>* m ) J c (131) 

= vec(C^) T b (132) 

from which the derivative interpretations (l33l )- (l35l > imply 

tr ($^(f) T C(<))- 

tr «B(i) T C(i))_ 
(133) 

vec(S(t)$J) T ' 

T 


z (id) = 


[f>i (g) 7jv]ij 
[$M (g) 




?(*>*) 


(f)= 


vec 


(CV)*i 


H\T 


vec 


(cm 


H ) T 

m) 




w= 


vec(7?(i)<& 


m; 


(134) 

From the above expressions, it can be readily shown that 


N b 

Eii^wii^ 

M 

E ii^w h Hf 

(135a) 

2—1 

m=1 

tr (r 1 C(f) H C(f)) 

(135b) 

N c 

Eh^wii^ 

M 

E \my* m \\ 2 F 

(136a) 

2 = 1 

m= 1 

tr (T 2 B(t) J B(t)*) 

(136b) 

with pre-computed 

M 

Fi — V $ 

L 1 — / y = m ^ Tn , 

M 

Fo A V $ 

A 2 — / y m-*- w 

(137) 


m =1 m=l 

The following quantities can also be pre-computed: 


N b 

Ei 

i -1 
Nc 


ALj) ||2 


||2 


= rr. 


= fr 2 


[ JVJjj 


tJVlii 


(138) 

(139) 

(140) 


Eii^ 

2=1 
IVi, 2V C 

EE II^^H 2 = ATtr(ri) = Wtr(r 2 ). 

*=i i=i 

Furthermore, under the scalar variance approximation, 
r(t) = (1 - v r (t)i' s (t)v b (t)\D\a,gTi ® 7jv])c(f) (141) 

+ v r (t) [vec (B(i)*$i) ,..., vec (B(t)*$ M )] *(t) 
q(t ) = (l - ^ 9 (f)i/ s (t)i/ c (t)[Diagr 2 ® 7jv])S(t) (142) 

+ ^(t) [vec (C(t)**I) , • • •, vec ] s(t) 


7 When N = 1. [[29} reduces to the general parameterization 0. 


R{t) = C(t)(l L - v r (t)v s (t)v b {t)D\a,gTi) 
+ v r (t)B(t)* s m m m ) 


Q(t) = B{t){l K - v q (t)v s (t)v c (t) Diagr 2 ) 

+ v q (t)6(ty . 


\m=l 


(143) 


(144) 


Note that ( 1 1 33b - (l 1 44b specify the essential quantities needed 
for the implementation of scalar-variance P-BiG-AMP. We 
discuss the complexity of these steps below. 

Suppose w.l.o.g. that d> m has N$ < KL nonzero en¬ 
tries, with possibly different supports among {This 
implies that has at most rnin(A"L, MN<p) 

nonzero elements. It then follows that z^*’*\t) from 
(11331 ) consumes NKL + MN$ multiplies, ( 11351 ) con¬ 
sumes ss Nmm{L 2 , M(N ( j > + K)}, and ( 11361 ) consumes ss 
N min{A' 2 , M(N ( / ) +L)}. Furthermore, (11431 ) consumes NL+ 
Nm.m(KL,MN ( f > ) + MN^ multiplies and (11441 ) consumes 
NK + Nmm(KL, MN<p). In total, 0((Vmin(7 2 , M(A^ + 
A')) + N min(A 2 , M + L )) + NKL + MN$) multiplies 
are consumed. For illustration, suppose that N 0 < K,L and 
M < K, L. Then O(NKL) multiplies are consumed, in 
contrast to 0(MNbN c ) = 0(MN 2 KL) in the general case. 


C. Matrix-product Structure 

A special case of (11 121 ) and ( 11291 ) is when 

Z = BC (145) 

which occurs, e.g., in applications such as MC, RPCA, DL, 
and NMF, as discussed in Section II-AI In particular, ( II 121 ) 
reduces to ( 11451 ) when Nb = KN and vec(A* i - > ) = [7] : ,, and 
(11291 ) reduces to (11451 ) when M = KL and vec(*I? m ) = [7] : m . 
It can be verified 0:1 that, under (1145b . P-BiG-AMP reduces 
to BiG-AMP from l20l . 


D. Low-Rank plus Sparse Structure 

Recall (0, the problem of recovering a “low-rank plus 
sparse” matrix. Writing the low-rank component as L = 
B J Ci with B G C NxK , Ci G C NxL , and N < min{A, A}, 
we can invoke ( 11301 ) to get 

Zm = b J ($* n ® In)ci + 4LnC 2 , TO = 1, . . . , M, (146) 

with &o — 1 (recall Section [Mil ), b = vec(B), C\ = vec(C), 
c 2 = vec (S) (recall S was the sparse matrix from 0), and 
e = [ci, cJ] T 

Note that the structure of the first term of (11461 ) can be 
exploited through (1 1 33b - (l 1 34b . as discussed in Section IIV-BI 
Meanwhile, straightforward computational simplifications of 
the second term in (11461 ) result when <p^ is sparse. But care 
must be taken in applying the scalar-variance approximation 
in this case: it may be advantageous to use different scalar 
variances for C\ and C 2 (e.g., J and z/^,^)- 
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P-BiG-AMP EM-P-BiG-AMP 



measurements M measurements M 


Fig. 2. Empirical success rate for noiseless sparse signal recovery under 
the i.i.d. parametric bilinear model ju as a function of the number of 
measurements M and the signal sparsity K. Success rates were averaged 
over 50 independent realizations. Points above the red curve are infeasible 
due to counting bound, as described in the text. 

V. Numerical Experiments 

We now present the results of several numerical experiments 
that test the performance of P-BiG-AMP and EM-P-BiG- 
AMP in various applications. In most cases, we quantify 
recovery performance using NMSE( 6 ) = ||b — &H 2 /H&H 2 an d 
NMSE(c) = ||c— c11§/11c11§■ Matlab code for P-BiG-AMP and 
EM-P-BiG-AMP can be found in l53l . 

A. I.i.d. Gaussian Model 

First, we examine the performance of P-BiG-AMP in the 

(i i) 

case of i.i.d. Gaussian z m , as assumed for its deriva¬ 
tion. In particular, {zm^} were drawn i.i.d. CJ\f(0, 1), b = 
[ 61 ,..., 6 jv 6 ] T were drawn Bernoulli-CA/”(0,1) with sparsity 
rate £ 6 , and c = [ci..... C;vJ T were drawn Bernoulli- 
CJ\f( 0, v c ) with sparsity rate £ c . We then attempted to recover 
b and c from M noiseless measurements of the form © 
under b 0 = 0 and Co = 0. For our experiment, we used 
Nb = N c = 100 and v c = 1 , and we varied both the sparsity 
rate = £ c = K/ 100 and the number of measurements M. 

We tested the performance of both P-BiG-AMP, which 
assumed oracle knowledge of all distributional parameters, 
and EM-P-BiG-AMP, which estimated the parameters G = 
[v c , £ b > £ c ] T as well as the additive white Gaussian noise 
(AWGN) variance^] Figure |2] shows the empirical success rate 
for both algorithms, averaged over 50 independent problem 
realizations, as a function of the sparsity K and the number 
of measurements M. Here, we declare a “success” when both 
NMSE( 6 ) < -60 dB and NMSE(c) < -60 dB. The figure 
shows that both P-BiG-AMP and EM-P-BiG-AMP gave sharp 
phase transitions. Moreover, their phase transitions are very 
close to the counting bound “M > 2 K,” shown by the red 
line in Fig. U 

B. Self Calibration 

We now consider the self calibration problem described in 
Section II-AI In particular, we consider the noiseless single 
measurement vector (SMV) version, where the goal is to 
jointly recover the K'-sparse signal c £ M Nc and calibration 
parameters b £ l * 1 from M noiseless measurements of the 
form 2 = \)\i\g(Hb)Ac. where H and A are known. For our 

s EM-P-BiG-AMP was not told that the measurements were noiseless. 


EM-P-BiG-AMP SparseLift 



signal sparsity K signal sparsity I\ 


Fig. 3. Empirical success rate for noiseless self-calibration as a function of 
the number of calibration parameters N & and the signal sparsity K. Results 
are averaged over 10 independent realizations. 


experiment, we mimic the setup used for [ 8 ] Figure 1], Thus, 
we set N c = 256 and M = 128, we chose H as the first Aj, 
columns of a M-point unitary DFT matrix, and we drew the 
entries of A as i.i.d. Af( 0.1). Furthermore, we drew A-sparse 
c with i.i.d. Af(0,u c ) non-zero elements chosen uniformly at 
random, and we drew b as i.i.d. Af( 0, 1 ). 

We compared the performance of EM-P-BiG-AMP to 
SparseFift |S), a recently proposed convex relaxation, using 
CVX for the implementation. EM-P-BiG-AMP modeled c 
as Bernoulli-A/”(0, v c ) and learned v c , the sparsity rate £, 
and the AWGN variance]! Figure [3 shows empirical success 
rate as a function of signal sparsity K and number of cal¬ 
ibration parameters N\,. As in ( 8 ), we considered NMSE = 
||bc T — 6 c T |||./||6c t |||,, and we declared “success” when 
NMSE < —60 dB. Figure [3] shows that EM-P-BiG-AMP’s 
success region was much larger than SparseFift’s0 although it 
was not close to the counting bound M > N^+K, which lives 
just outside the boundaries of the figure. Still, the shape of EM- 
P-BiG-AMP’s empirical phase-transition suggests successful 
recovery when M > ai(Nb+K) for some ai, in contrast with 
SparseLift’s empirical and theoretical J 8 j success condition of 
M > a 2 Nf,K for some a-i- 


C. Noisy CS with Parametric Matrix Uncertainty 

Next we consider noisy compressive sensing with para¬ 
metric matrix uncertainty, as described in Section II-AI Our 
goal is to recover a single, A-sparse, Ay- length signal c from 
measurements y = (A * 0 - 1 + biA^)c + w £ R M , where 
b = [&i,...,&zvJ T are unknown calibration parameters and 
w is AWGN. For our experiment, N c = 256, K = 10, c 
had i.i.d. A/"(0, v c ) non-zero elements chosen uniformly at 
random with v c = 1 , b was i.i.d. Af(0, v b ) with v b = 1 , 
A ( ' <y> was i.i.d. A/”(0,10), and was i.i.d. A/"(0,1). 

The noise variance i/ w was adjusted to achieve an SNR 

= II y - Hli/IMl! of 40 dB - 

We compared P-BiG-AMP and EM-P-BiG-AMP to i) the 
MMSE oracle that knows c, ii) the MMSE oracle that knows 
b and support(c), and iii) the WSS-TLS approach from |9j, 

9 See footnote [8] 

10 The SparseLift results in Fig. [3] agree with those in (S] Figure 1], 
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via alternating minimization. For WSS-TLS, we used oracle 
knowledge of v w , oracle tuning of the regularization parameter 
A, and code from the authors’ website (with a trivial modifica¬ 
tion to facilitate arbitrary A W). P-BiG-AMP used a Bernoulli- 
Gaussian prior with sparsity rate £ = K/N c and perfect 
knowledge of v c and z/'\ whereas EM-P-BiG-AMP learned 
the statistics [£, u c , v w ] J = 6 from the observed data. Figure[4] 
shows that, for estimation of both b and c, P-BiG-AMP gave 
near-oracle NMSE performance for M/N > 0.2. Meanwhile, 
EM-P-BiG-AMP performed only slightly worse than P-BiG- 
AMP In contrast, the NMSE performance of WSS-TLS was 
about 10 dB worse than P-BiG-AMP, and its “phase transition” 
occurred later, at M/N = 0.3. 

D. Totally Blind Deconvolution 

We now consider recovering an unknown signal e, and 
channel 6, from noisy observations y-i = Zi+Wi of their linear 
convolution {zi} = {b/j * {g}, where Wi ~ i.i.d. J\f{Q,v w ). 
In particular, we consider the case of “totally blind deconvolu¬ 
tion” from ea, where the signal contains zero-valued guard 
intervals of duration N g > Nt, — 1 and period N p > N g , 
guaranteeing identifiability. Recalling the discussion of joint 
channel-symbol estimation in Section II-AI we see that a zero¬ 
valued guard allows the convolution outputs to be organized 
as Z = Conv(b)C, where Conv(b) £ ]^ N P x ( N p~ N g) i s the 
linear convolution matrix with first column b. For our exper¬ 
iment, we used an i.i.d. CAf( 0,1) channel b, and two cases 
of i.i.d. signal c: Gaussian Cj ~ CAf( 0,1) and equiprobable 
QPSK (i.e., Cj £ Also, we used guard period 

N p = 256, guard duration N g = 64, channel length Nt, = 63, 
and L = 3 signal periods. 

We compared P-BiG-AMP to i) the known-symbol and 
known-channel MMSE oracles and ii) the cross-relation (CR) 
method lf56l . which is known to perform close to the 
Cramer-Rao lower bound ll56| . In particular, we used CR 
for blind symbol estimation, then (in the QPSK case) de- 
rotated and quantized the blind symbol estimates, and finally 
performed maximum-likelihood channel estimation assuming 
perfect (quantized) symbols. Figure 0 shows that, with both 


Fig. 5. Channel estimation NMSE (left), Gaussian-symbol estimation NMSE 
(center), and QPSK symbol error rate (right) versus SNR for totally blind 
deconvolution. Results are averaged over 500 independent realizations. 

Gaussian and QPSK symbols, P-BiG-AMP outperformed the 
CR method by about 5 dB in the SNR domain. Moreover, by 
exploiting the QPSK constellation, both methods were able to 
achieve oracle-grade NMSE ( 6 ) at high SNR. 

E. Matrix Compressive Sensing 

Finally, we consider the problem of matrix compressive 
sensing, as described in Section II-AI and further discussed 
in Section IIV-DI Our goal was to jointly recover a low rank 
matrix L = B J C\ £ C 100xl0(:) and a sparse outlier matrix 
S = C 2 £ C 100xl0 ° from M noiseless linear measurements 
of their sum, i.e., {z m }^f =1 in ([3}. For our experiment, 
the sparse outliers were drawn with amplitudes uniformly 
distributed on [— 10 , 10 ] and uniform random phases, similar to 
lfl3l Figure 2], But unlike lfl3l Figure 2], the sensing matrices 
{3> m } were sparse, with K = 50 i.i.d. CA/"(0,1) non-zero 
entries drawn uniformly at random. 

We compare the recovery performance of EM-P-BiG-AMP 
to the convex formulation known as compressive principal 
components pursuit (CPCP) ED, i.e., 

argmin ||i||* + Alls'll! s.t. z m = tr {& J m (L + S)} Vm, 

L.S 

(148) 

which we solved with TFOCS using a continuation scheme. 
In accordance with lfl3l Theorem 2.1], we used A = 1/10 
in ( 11481 ). EM-P-BiG-AMP learned the variance of the entries 
in Ci, the sparsity and non-zero variance of C 2 , and the 
additive AWGN variance El Although EM-P-BiG-AMP was 
given knowledge of the true rank R, we note that an unknown 
rank could be accurately estimated using the scheme proposed 
for BiG-AMP in (20l Sec. V-B2] and tested for the RPCA 
application in l2Tl Sec. III-F2], 

11 See footnote [8] 
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Fig. 6. Empirical success rate for noiseless matrix compressive sensing as a 
function of rank R and outlier sparsity rate £ for M = 5000 (top), M = 8000 
(middle), and M = 10000 (bottom) measurements. The left column shows 
EM-P-BiG-AMP and the right column shows CPCP solved using TFOCS. All 
results are averaged over 10 independent realizations. Points above the red 
curve are infeasible due to the counting bound, as described in the text. 


EM-P-BiG-AMP CPCP via TFOCS 



5 10 15 20 25 30 5 10 15 20 25 30 


rank R rank R 

Fig. 7. log 10 (average runtime), in seconds, for noiseless matrix compressive 
sensing as a function of rank R and outlier sparsity rate £ for M = 10000 
measurements. Runtimes were averaged over 10 successful trials; locations 
(R,£) with any unsuccessful trials are shown in white. 


parametric bilinear form z m = Eto from 

noisy measurements {y m }m— i> where y m and z m are related 
through an arbitrary likelihood function and Zm' JI , bo, Co are 
known. Our approach treats bi and c, as random variables 
and Zm as an i.i.d. Gaussian tensor in order to derive a 
tractable simplification of the sum-product algorithm in the 
large-system limit, generalizing the bilinear AMP algorithms 
in ll2()l . 1221 . We also proposed an EM extension that learns 
the statistical parameters of the priors on bi, Cj, and y m \z m . 
Numerical experiments suggest that our schemes yield signif¬ 
icantly better phase transitions than several recently proposed 
convex and non-convex approaches to self-calibration, blind 
deconvolution, CS under matrix uncertainty, and matrix CS, 
while being competitive (or faster) in runtime. 

VII. Acknowledgement 
The authors thank Yan Shou for help in creating Fig. [5] 


Figure [6] shows the empirical success rate of EM-P-BiG- 
AMP and CPCP versus R (i.e., the rank of L) and £ = K /1 (JO 2 
(i.e., the sparsity rate of S ) for three fixed values of M (i.e., 
the number of measurements). Each point is the average of 10 
independent trials, with success defined as \\L— Z|||,/||i|||, < 
—60 dB. Figure [6] shows that, for the three tested values of 
M, EM-P-BiG-AMP exhibited a sharp phase-transition that 
was significantly better than that of CPCP0 In fact, EM-P- 
BiG-AMP’s phase transition is not far from the counting bound 
M > R(200 — R) + £100 2 , shown by the red curves in Fig. [6] 

Figure [7] shows the corresponding log 10 (average runtime) 
versus rank R and sparsity rate £ at M = 10000 mea¬ 
surements. Runtimes were averaged over 10 successful tri¬ 
als; locations (R,f) with any unsuccessful trials are shown 
in white. The figure shows that EM-P-BiG-AMP’s average 
runtimes were faster TFOCS’s throughout the region that both 
algorithms were successful. The runtimes for other values of 
M (not shown) were similar. 

VI. Conclusion 

We proposed P-BiG-AMP, a scheme to estimate the pa¬ 
rameters b = [bi ,..., frjVi,] 1 and c = [ci,..., cjv c ] t of the 

12 The CPCP results in Fig. [6] are in close agreement with those in cm 
Figure 2], even though the latter correspond to real-valued and dense <l? m . 


Appendix A 

On the relation between © and © 

Here we show that © is a special case of ©. From ©, 


N b 

Zml — ^ ^ Q • 5 ^Nb] 

2=1 A v T 

±b J 

a (1)T 

a {Nb)J 

Cl (149) 

= tr {A mCl b J }, 

— A 

— -c*-m 

050) 

where a^ 1 denotes the mth row of and C; denotes the 

(th column of CG K jYxL . Then defining e; as the (th column 
of II and c = vec(C), we can write 

A m Cl = (ej <g> A m ) c. 

051) 




Plugging (|l5l| into (11501) yields 



z m i =tr{<f> J ml L} 


052) 


with L = cb J , a rank-one matrix. Thus © is equivalent to 
© with rank-one L and S = 0. 
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Appendix B 
Scaling of E{z^} 


From (O we have 



N b N c 

2 ' 

W 

N 

3 “ 

II 

W 

EEbz^'c, 



o 

II 

•«*> 

o 

•11 

> 


N b N c N b N c 


+ (b Z m (c ~ C m (t)) , 


(b b m (t)) Z rn (C C rn ( 1 ) ) 


b-b m (t) 

T 

1 y 

2 ^ m 

b b rn (t') 

c 


1 yX 

.2 ^m 

C-c m (t)_ 


var{z | b, : = b,} = E {z 2 | b, = b z } - E {z | b, = b,} 2 . 


The first term in (1 1 60b can be expanded as 


(153) 


EEEE E ^bibi'CjCfZ^z^’^j (154) 

1— 0 j—0 i'=0 j'=0 
N b N c 

= E E E { b ?i E { C 71 E{z^) 2 } (155) 

2— 0 j =0 

= 0(1) (156) 

since it was assumed that E{Zm^ 2 } = 1, that both E{b^} 

and E{c 2 } scale as 0(1/M), and that both N c /M and N^/M 
scale as 0(1). 

Appendix C 

Central Limit Theorem 
To apply the CLT, we first expand 

N b N c 

z - = EE b iC = b J Z m c (157) 

2—0 j—0 

= -b m (t) J Z m Cm(t) + b m (t) T z m c + b J Z m c m (t) 


E {z 2 | bj = bi} 


= E■ 


= E■ 


E E b ^J z(k ' 3) + b ‘. E C 2 


Ah o) 


k^i j 


061) 

(162) 


E E bfcC .’ 2 


(k,j) 


K L k^i j 

+ 2bi E { Y, E b kCjZ^’ti Y, c'jzM) 


+ bf E. 


k^H j j> 

1 2 ' 

(hj) 


E< 


L 3 


063) 


(158) 


where the matrix Z m is constructed elementwise as [Z m ] t] = 
Zm’ and for ( 1158b we recall that b m (t) is the mean of random 
vector b and c m (t) is the mean of random vector C under the 
distributions in (IlB . Examining the terms in (1 158b . we see 
that the first is an 0(1) constant, while the second and third 
are dense linear combinations of independent random variables 
that also scale as 0(1). As such, the second and third terms 
obey the CLT, each converging in distribution to a Gaussian as 
M — > oo. The last term in ( 1 158b can be written as a quadratic 
form in independent zero-mean random variables: 


We now analyze the three terms in ( 1 1 63b . 

The first term in (1163b can be evaluated as follows. 


(159) 


It is shown in ll57l that, for sufficiently dense Z m , the 
quadratic form in (1158b converges in distribution to a zero- 
mean Gaussian as M —> oo. Thus, in the LSL, Z m equals a 
constant plus three Gaussian random variables, and thus Z m 
is Gaussian. 

Appendix D 

Derivation of Conditional Variance 

In this appendix, we derive the variance expression (l22l >. For 
ease of presentation, we supress the subscript m and iteration 
count t. We begin by writing 


E • 


= E 


EE b ^ c 


iZ 


( k,j) 


k^i j 

E E (( bfc ~E( C J _ Cj) +bk(Cj - Cj) 

2 \ 


k^i j 


+ (b k ~b k )cj +b k c j S j 
j 


(164) 


1 2 


Y^ iK 


E 1 

k^i 


E 


CjZ 


(k,j) 


j L k^i 
2 


E E 


(k,j) 


k^i j 


(165) 


- E E "Mz (kJ)2 +E [** ,j) - *} 2 


( 160 ) 


j 

Y ^ M)2 + 


Yhz^ 

k^i 


(166) 
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The second term in ( 1 1 63b then becomes 


26 * E \ EE bkCjzW Y c 'j z{i,j,) 

V 3 3' 





(167) 


= 26, 


E EE[( b * bk )(Cj Cj ) -|- bf~ (c j Cj ) 


v k^i 


+ (bfc - b k )cj + b k Cj z (kj) Yj Cj> - Cjf)z^’ 3 ' ) 

j' 



(168) 


Continuing, 

2 ^ E {EE b ^i z(M E c > (i ’ j,) 

l k^i 3 j' 

= 2 ^EE^^ (M * (iy 

k^i 3 


+ 26,:EE^E fe ’ j) E 


Cj'Z 


(*>/) 


(169) 


k^i j 


= 2 ^E Y^ z{k,j) -bi 


: (ij) \ z (i,3) u c 


+26 m EE^z (M ) E^ (,y) 

\ k 3 j / 3 ' 


= 2b % Y 2 (iJ) ^ c 

j 

+ 2 bi -b i z< i ’*^z ( - i ’*'>. 

Finally, the third term in ( 1 1 63b becomes 


6 2 E. 


E< 


*(»j) 


= bf E- 


E [(°i ^) +i 


A 4 >3) 


= b*Y,"p m2 +ti 


1 2 


E< 




= 6 2 ^ e ^ (iy2 + t (i ’* )2 j . 


(170) 

(171) 

(172) 

(173) 

(174) 

(175) 


Next, we analyze the second term in ( 11601 ). Using the 
expression for E{z | b, = bi) from ( l20l ). we have 


— E{z | b, = 6,} 2 = — 

= - -b^Y 
- 6 2 ?< i ’*) 2 


-%&*>*))+bi&W 

- 2 b i z {i '* ) (z { *’* ) -%&*’*)) 


(176) 


(177) 


Y^* ] 

_ k^i 


- 26 i £ (i ’* ) (z ( *'* ) -bi^) - 6 2 2 (i ’* )2 . 

(178) 


Finally, from (11601) and (1163I ). we know that var{z | bi = 6i} 
equals the sum of (1 1 66b . (117 lb . ( 1175b . and ( 11781 ). Adding them 
together and dropping the terms that cancel, we find that 


var{z | bi = 6i} 

= EE v b k vp ^ k ’ j)2 + Y v j [ y * J) - M (fc j) ] 2 

k^i j j 

+ Y 4^ k ’* )2 + 2 bi Y * ,j) - biZ (iJ) ) 

k^i 3 

+6?E^ W)2 - (179) 

3 


The sum of the first three terms in ( 11791 ) can then be rearranged 
to form 


-E‘ 


k^ti L j 


Y v C jZ {kj)2 + z < fe ’* )2 


+ Y Vj[&* ,i)2 - 26 $*<3)z&j) +^-( fc 4)2 


. (180) 
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Appendix E 
Derivation of (1361) 


and 


In this appendix we derive equation ( I36l >. Using ( 1261 ) and 
we write the H m (-) term in (f24b as 


/ n c 

H m v? m (t) + 6 2 E 

V .7=1 

N c \ 

+ 26 iE^(*) -& mii (t)^' )2 ] 

9=1 / 


= H m yj m (t) + {bi -bm ti (t))z^$(t), 

Nc 

"m(t) + {bi -b m ,i{t )) 2 


(181) 


9=1 


N c 
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i=i 

v c 1 \ 

^ ) W 2 + E<9W^ )2 

2=1 J / 

= Hm ^m(i) + (6i - 6i(*))^ } (*) + 0(1/M), 

JV C 

+ {bi-hit )) 2 E t 'm,j{ t ) z m :,)2 


(182) 


9 = 1 


N c 


+ 2 {bi -% (t)) E + 0 (1/M) . 

9 = 1 / 


s 2 h 




-22 = 2<i^(,i) 2 H" + 2 {b l -b l (t)) E 

* V 2 = 1 

V c \ 

2=1 / 


2=1 


+ 


N c 


2{b i -b i (t))Y< j {t)tt j]2 


9 = 1 


1V C 


‘ 2 Y< i {^ j) {t)^ j) 

9=1 


V c 


^{t)H' m + 2{bi — bi(t)) E W4t’ i)2 

V 2=1 

1V C \ 

+ 2E^(^m J) W^ J) p. 

9 = 1 / 


085) 


Appendix F 

Taylor Series Expansion 

In this appendix, we perform a Taylor series expansion of 
( l36t and analyze the result in the LSL to obtain ( |371 >. 

We start by calculating the first two derivatives of the H m (-) 
term from ( l36l > w.r.t. bi. From ( l36l >. we find that 


which implies 


dH m 

dbi 


N c 


= t)H' m + 2(6, -bi(t)) E ^,iW4’ i)2 
V 2 = 1 


N c 


+ 2 E^'W^W^ Urn, 

9=1 / 


(183) 


d 2 H n 


db 2 


bi—bi ( t ) 


where H' m denotes the derivative of //„, (■, ■) w.r.t. the first 
argument and H m denotes the derivative w.r.t. the second 
argument, supressing their arguments for brevity. Equation _ ~(j ,*)/,\2 tth 
(11831 ) then implies " 


Oil,, 


dbi 


Nc 


y 2=i 


{t)H m 
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(184) 
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The Taylor series expansion of (f24~b can then be stated as 


absorbing -invariant terms into the const, we obtain ( T37t : 


A m->i(Mi) 

~ const + H m (p m (t) + 0(1/M), i/p (0 + 0(1/M)) 


+ ( bi-bi{t )) 


?(*.*) 


(0 


x + 0(1/M), i£(t) + 0(1/M)) 


JV C 


X H m (pm{t) + 0(l/M),v 1 / n (t) + 0(1/M)) 


+ ^{bi -bi(t)) 


x 77" (p m (t) + 0(1/M), <(f) + 0(1/M)) 

+ (2 ) 


i=i 


A ^i{t,bi) 
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N c 


?(*’*) (7\ 2 


- Z'm(O) E U jV) Z ™ j) (^t’ j) (0 -bi{t)z% j 


3 =1 


Afc 


^ («m(0 - ^m(O) E 

i=i 


6f, 


via the definitions of s" m (f) and v/ n (t) from (I38l) -(l39b and the 
following relationship established in 


(191) 




Appendix G 
Derivation of 


x H m (p m (t) + 0(1/M),i^(f) + 0(1/M)) 

J In this appendix, we show how ( 1691 ) results in the LSL. 

+ 0(1/M 3 ), (187) From (l26t and ( fl9l . we have 


where the second and fourth terms in (11861 ) were absorbed 
into the 0(1/M 3 / 2 ) term in (1187b using the facts that 


N c 


[4 E v m,j (02^' ' J) (0 -m J) J (0 

/ N c \ 


2 E<fW 


,(*> J ')2 


7=1 


N c 


^ i=i 


0(1/M 1/2 ) (188) 
0(1) (189) 

0(1/117). (190) 


which follow from the 0(1/A7) scaling of -(f), as well as 
from the facts that (6* — 6j(f)) is 0(1/A7) and the function 
77 m and its partials are 0(1). 

Note that the second-order expansion term in (1187b is 
0(1/M). We will now approximate (1187b by dropping terms 
that vanish relative to the latter as M —> oo. First, we replace 
Z-ym (£) with Zrn *' 1 (t) in the quadratic term in (1187b . since 
— Zm* {t)) is O^/M 1 / 2 ), which gets reduced to 
0(1/A7 3 / 2 ) via scaling by (h, — bi(t)) 2 . Note that we cannot 
make a similar replacement in the linear term in (1187b , because 
the (bi — bi(t )) scaling is not enough to render the difference 
negligible. Next, we replace v/ n j (t) with i/J(f) throughout 
(fieri) . since the difference is 0(1/M 3 / 2 ). Finally, as es¬ 
tablished in 1201 . the 0(1/M) perturbations inside the H m 
derivatives can be dropped because they have an 0(1/M 3 / 2 ) 
effect on the overall message. With these approximations, and 


Pm(t) = E bm,k{t)Cmj{t)z^' j \ (192) 

k,j 


Plugging ( l56t and ( l66b into the previous equation gives 
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= 2<*'*)(t)-fl m (t-l)^ v i( t )^’* ) ^-l)^ , * ) (t) 

+ E W ) + 0 (1/M). (195) 

3 / 

since the second-to-last term in (1194b is 0(1/M). Because the 
first two terms in (1195b are 0(1), the 0(1/M) term in (1195b 
vanishes in the LSL, resulting in (l69b . 
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Appendix H 
Derivation of (1761) 

In this appendix, we derive (f76t . Plugging (l74l > and (f75l) 
into (E3 gives 

"m(t) 

N c / N b y 

U v ti ) 
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j[E{z.[E 2 }] 2 if (*,to) ^ (*',to') 

2 = 0(1), (202) 

where in (1201 b we used the fact that E{z 4 } = 3[E{z 2 }] 2 for 
2 Gaussian Z. Meanwhile, the mean-squared value of the second 
term in (l80l > can be shown to be 

N b M 
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(198) 


where in the last step we retained only the 0(1) terms, since 
the others vanish in the LSL. 


Appendix I 
Derivation of (f8Tb 

In this appendix, we derive ( |8H . Treating Zm JI as i.i.d. 
zero-mean unit-variance Gaussian, the mean-squared value of 
the first term in (l80l > is (suppressing the SPA iteration t for 
brevity) 
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2—1 m= 1 


0 , 2)2 
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= WEEE £(4) 2 (o 2 e {z!w> 2 z!2 )2 } 
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= K) 2 E EE^^) 2 K) 2 E { z m j)2 } E { z - fe)2 } 


k^j i m 


= 0(1/M 2 ). 


(203) 

(204) 


Thus, we see that the second term in ( l80l > vanishes relative to 
the first as M —toe. 

Appendix J 
Derivation of (1841) 

In this appendix, we derive (l84l >. Plugging (l40l > and ( 14 1 1 
into the second half of v J (t) from (|5H . we find 
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- 1 


^m(f) 


(208) 


where the random variable Z m above is distributed according 
to the pdf in d44l) . 
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